AITopics | attribution method

Collaborating Authors

attribution method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e1ebda145808ca45774993fb67314894-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-22-2026, 23:48:07 GMT

ARelated Work1 Data Attribution Evaluation: Given recent developments in data attribution methods for LLMs,2 past works in evaluating these methods fall two major categories: leave-out-out and task-based3 evaluation. Leave-one-out evaluation measures the correlation between the data attribution method4 scores and model-retraining, which can also be approximated using linear datamodeling score [26].5 In task-based evaluation, the data attribution method is evaluated based on its application towards6 downstream task, such as noisy label detection, counterfactual evaluation [3, 13].7 Training Data Selection: Selecting high-quality training data selection is important for efficient8 learning in LLMs. Common approaches to data selection relies on heuristic filtering, such as de-9 duplication and lexicon-filtering, [34], or semantic rating [48, 52]. Recent works have applied data10 attribution methods towards data selection in LLMs in both pre-training [56, 59, 15] and post-training11 [45, 53, 31]. These data attribution methods are dynamic and model-aware - increasing the frequency12 of performing selection is one way to take greater account for group influence, where online selection13 at each training step is most fine-grained [49].14 Toxicity/Bias Detection: Detecting and mitigating toxic/biased LLMs outputs is a crucial for safe15 deployment in real-word settings. Existing methods for detecting toxicity/bias in LLMs commonly16 include online API tools 1 [37] or LLM-classifiers [58, 21, 16, 27]. Factual Attribution: Identifying training examples which causes LLMs to generate specific factual20 statements is an important application of data attribution as AI tools are becoming increasingly21 common. Apart from baseline retrieval methods that leverage lexical/semantic similarity like BM2522 [48], Rep Sim [44] and Gecko [33], recent works have explored the use of data attribution in tracing23 factual knowledge in both pre-training[6] and post-training [42, 2].24 We provide below descriptions to the data attribution methods and non-attribution baselines evaluated26 in this work. Note that in our work, we consider non-attribution baselines as methods that do not27 estimate the impact of training samples on models, as detailed in [19].28 Rep-Sim [44]: (Non-attribution baseline) Rep-Sim computes the cosine similarity between last29 token last layer hidden states of training and reference examples. It is more efficient compared with30 gradient-based data attribution methods. BM25 [48]: (Non-attribution baseline) BM25 is a classic information retrieval algorithm that ranks33 training samples by lexical overlap with the query. It is significantly more efficient compared with34 gradient-based data attribution methods.35

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.46)
Asia > Middle East (0.28)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Government (0.93)
Media (0.93)
Information Technology (0.93)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

5ce3a49415f78db65a714b4f05c62f4e-Paper-Conference.pdf

Neural Information Processing SystemsJun-17-2026, 14:31:57 GMT

Data attribution for text-to-image models aims to identify the training images that most involv significantly e considerable influenced computational a generated resources output.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.68)
Government (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

OrdShap: Feature Position Importance for Sequential Black-Box Models

Neural Information Processing SystemsJun-13-2026, 22:22:09 GMT

Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume fixed feature ordering -- conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sanchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Fast Data Attribution for Text-to-Image Models

Neural Information Processing SystemsJun-12-2026, 09:26:58 GMT

Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them impractical for real-world applications. We propose a novel approach for scalable and efficient data attribution. Our key idea is to distill a slow, unlearning-based attribution method to a feature embedding space for efficient retrieval of highly influential training images. During deployment, combined with efficient indexing and search methods, our method successfully finds highly influential images without running expensive attribution algorithms. We show extensive results on both medium-scale models trained on MSCOCO and large-scale Stable Diffusion models trained on LAION, demonstrating that our method can achieve better or competitive performance in a few seconds, faster than existing methods by 2,500x - 400,000x. Our work represents a meaningful step towards the large-scale application of data attribution methods on real-world models such as Stable Diffusion.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Correcting misinterpretations of additive models

Neural Information Processing SystemsJun-11-2026, 12:05:10 GMT

Correct model interpretation in high-stakes settings is critical, yet both post-hoc feature attribution methods and so-called intrinsically interpretable models can systematically attribute false-positive importance to non-informative features such as suppressor variables. Specifically, both linear models and their powerful non-linear generalisation such as General Additive Models (GAMs) are susceptible to spurious attributions to suppressors. We present a principled generalisation of activation patterns - originally developed to make linear models interpretable - to additive models, correctly rejecting suppressor effects for non-linear features. This yields PatternGAM, an importance attribution method based on univariate generative surrogate models for the broad family of additive models, and PatternQLR for polynomial models. Empirical evaluations on the XAI-TRIS benchmark with a novel false-negative invariant formulation of the earth mover's distance accuracy metric demonstrates significant improvements over popular feature attribution methods and the traditional interpretation of additive models. Finally, real-world case studies on the COMPAS and MIMIC-IV datasets provide new insights into the role of specific features by disentangling genuine target-related information from suppression effects that would mislead conventional GAM interpretations.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Attributions All the Way Down? The Metagame of Interpretability

Baniecki, Hubert, Biecek, Przemyslaw, Fumagalli, Fabian

arXiv.org Machine LearningMay-8-2026

We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $ϕ(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution $φ_{j \to i}(f)$, by treating the attribution method itself as a cooperative game and computing its Shapley value. Theoretically, we prove that attributions hierarchically decompose into meta-attributions, and establish these as directional extensions of existing interaction indices. Empirically, we demonstrate that the metagame delivers insights across diverse interpretability applications: (i) quantifying token interactions in instruction-tuned language models, (ii) explaining cross-modal similarity in vision-language encoders, and (iii) interpreting text-to-image concepts in multimodal diffusion transformers.

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2605.06295

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment (0.93)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

Add feedback

Explanation of Dynamic Physical Field Predictions using WassersteinGrad: Application to Autoregressive Weather Forecasting

Essafouri, Younes, Raynaud, Laure, Drozda, Luciano, Risser, Laurent

arXiv.org Machine LearningApr-27-2026

As the demand to integrate Artificial Intelligence into high-stakes environments continues to grow, explaining the reasoning behind neural-network predictions has shifted from a theoretical curiosity to a strict operational requirement. Our work is motivated by the explanations of autoregressive neural predictions on dynamic physical fields, as in weather forecasting. Gradient-based feature attribution methods are widely used to explain the predictions on such data, in particular due to their scalability to high-dimensional inputs. It is also interesting to remark that gradient-based techniques such as SmoothGrad are now standard on images to robustify the explanations using pointwise averages of the attribution maps obtained from several noised inputs. Our goal is to efficiently adapt this aggregation strategy to dynamic physical fields. To do so, our first contribution is to identify a fundamental failure mode when averaging perturbed attribution maps on dynamic physical fields: stochastic input perturbations do not induce stationary amplitude noise in attribution maps, but instead cause a geometric displacement of the attributions. Consequently, pointwise averaging blurs these spatially misaligned features. To tackle this issue, we introduce WassersteinGrad, which extracts a geometric consensus of perturbed attribution maps by computing their entropic Wasserstein barycenter. The results, obtained on regional weather data and a meteorologist-validated neural model, demonstrate promising explainability properties of WassersteinGrad over gradient-based baselines across both single-step and autoregressive forecasting settings.

artificial intelligence, displacement, machine learning, (19 more...)

arXiv.org Machine Learning

2604.2258

Country: Europe > France (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

1305_making_sense_of_dependence_eff

Paul Novello

Neural Information Processing SystemsApr-24-2026, 22:45:48 GMT

In this part, we state the orthogonal decomposition Property, motivate its importance with a pedagogical example, and finally prove Proposition 1, which enables the decomposition property in the context of HSIC attribution method. A.1 Orthogonal Decomposition Property Let x = {x1,..., xn}2Xn be a set of n univariate random input variables. For any subset A = {l1,...,l |A|} { 1,...,n}, we denote xA =( xl1,..., xl|A|) the vector of input variables with indices in A. Let y the random output variable defined by y = f(x), F the RKHS defined by the kernel kA: X|A|! R and G the RKHS defined by the kernel l: Y! R. In [11], the author shows that for any choice of kernel l, if we respect some constraints on the kernel kA, we can construct indices HSIC (xA,y) that satisfy the following decomposition property. The constraints on the kernel kA are detailed in the main document and in the last section of this appendix.

artificial intelligence, hsic, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure

Neural Information Processing SystemsApr-24-2026, 22:45:43 GMT

This paper presents a new efficient black-box attribution method built on HilbertSchmidt Independence Criterion (HSIC). Based on Reproducing Kernel Hilbert Spaces (RKHS), HSIC measures the dependence between regions of an input image and the output of a model using the kernel embedding of their distributions. It thus provides explanations enriched by RKHS representation capabilities. HSIC can be estimated very efficiently, significantly reducing the computational cost compared to other black-box attribution methods. Our experiments show that HSIC is up to 8 times faster than the previous best black-box attribution methods while being as faithful. Indeed, we improve or match the state-of-the-art of both black-box and white-box attribution methods for several fidelity metrics on Imagenet with various recent model architectures. Importantly, we show that these advances can be transposed to efficiently and faithfully explain object detection models such as YOLOv4. Finally, we extend the traditional attribution methods by proposing a new kernel enabling an ANOVA-like orthogonal decomposition of importance scores based on HSIC, allowing us to evaluate not only the importance of each image patch but also the importance of their pairwise interactions.

attribution method, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre: Research Report (0.67)

Industry: Transportation > Air (1.00)

Technology: